ggml-cpu : optimize RVV kernels #15720

xctan · 2025-09-01T16:52:26Z

This PR introduces performance optimizations for some RISC-V kernels and expands hardware support by enabling half-precision extensions.

Using the perf profiler, I identified significant performance bottlenecks caused by pipeline stalls. The following 128-bit RVV kernels have been rewritten to resolve these issues:

ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q6_K_q8_K
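A common source of such stalls in dot-product loops is a single accumulator that every multiply-add depends on. The sketch below is a generic, portable C illustration of that problem and the usual fix: several independent accumulators that are combined once at the end. It is not the PR's RVV code. The function names `dot_i8_single` and `dot_i8_unrolled` are hypothetical. It uses int8 inputs with int32 accumulation, loosely mirroring the q8 side of these kernels.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: every iteration adds into the same accumulator, so each
   multiply-add must wait for the previous one to complete. This is a
   serial dependency chain that a pipeline can expose as stalls. */
int32_t dot_i8_single(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Four independent accumulators: neighbouring multiply-adds no longer
   depend on each other, so their latencies can overlap. Integer addition
   is associative, so the result matches the baseline exactly (assuming
   no int32 overflow). */
int32_t dot_i8_unrolled(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i + 0] * (int32_t)b[i + 0];
        acc1 += (int32_t)a[i + 1] * (int32_t)b[i + 1];
        acc2 += (int32_t)a[i + 2] * (int32_t)b[i + 2];
        acc3 += (int32_t)a[i + 3] * (int32_t)b[i + 3];
    }
    int32_t acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++) { /* leftover elements when n is not a multiple of 4 */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

In vector code, the same idea means keeping several vector accumulators live and performing the horizontal reduction only once. Whether the rewritten kernels use exactly this restructuring is an assumption here.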

To allow intermediate results to be kept as half-precision floats, this PR enables the Zvfh (vector half-precision floating-point) extension and adds implementations for several performance-critical kernels.
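Below is a minimal sketch of how code can tell at compile time whether Zvfh paths are available. It assumes GCC or Clang, which follow the RISC-V C API convention of defining a `__riscv_<ext>` macro (here `__riscv_zvfh`) for each extension enabled through `-march`. An example would be `-march=rv64gcv_zfh_zvfh`; the exact flags this PR sets are not shown here. The function name `zvfh_compiled_in` is hypothetical.

```c
/* Hypothetical helper: returns 1 if this translation unit was compiled
   with the Zvfh extension enabled, 0 otherwise. Relies on the
   __riscv_zvfh feature macro that GCC/Clang define for enabled
   extensions. */
int zvfh_compiled_in(void) {
#if defined(__riscv_zvfh)
    return 1; /* fp16 vector arithmetic may be used */
#else
    return 0; /* fall back to fp32 intermediates */
#endif
}
```

Kernels can use the same `#if defined(__riscv_zvfh)` guard directly to select an fp16 implementation and keep an fp32 fallback for targets without the extension.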

xctan · 2025-09-01T17:05:35Z

Performance

Benchmark model: unsloth/Qwen3-4B-Instruct-2507-GGUF (ModelScope, Hugging Face)

model	size	params	backend	threads	test	t/s	% of master	branch
qwen3 4B Q4_K - Medium	2.32 GiB	4.02 B	CPU	64	pp512	68.40 ± 0.74	183%	PR
qwen3 4B Q4_K - Medium	2.32 GiB	4.02 B	CPU	64	pp512	37.30 ± 0.35		master
qwen3 4B Q4_K - Medium	2.32 GiB	4.02 B	CPU	64	tg128	20.24 ± 2.45	177%	PR
qwen3 4B Q4_K - Medium	2.32 GiB	4.02 B	CPU	64	tg128	11.41 ± 0.77		master
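The percentage column appears to be the PR throughput divided by the master throughput for the same test. The arithmetic for both rows is shown below. Note that the tg128 PR figure carries a wider spread (± 2.45 t/s), so that ratio is less precise than the pp512 one.

```latex
\text{pp512: } \frac{68.40}{37.30} \approx 1.834 \;\Rightarrow\; 183\%,
\qquad
\text{tg128: } \frac{20.24}{11.41} \approx 1.774 \;\Rightarrow\; 177\%
```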

Validation

Test model: unsloth/Qwen3-0.6B-GGUF (ModelScope, Hugging Face)

llama-perplexity -m Qwen3-0.6B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw

branch	perplexity
master	22.8862 +/- 0.20017
PR	22.8811 +/- 0.20010
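As a quick check of the validation result, the perplexity change is a small fraction of the reported uncertainty. The arithmetic below uses only the two values in the table.

```latex
\Delta\mathrm{PPL} = 22.8862 - 22.8811 = 0.0051,
\qquad
\frac{0.0051}{0.20} \approx 0.026
```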


github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Sep 1, 2025

xctan added 8 commits September 2, 2025 13:57

ggml-cpu : optimize rvv ggml_vec_dot_f32 · d092a22
ggml-cpu : optimize 128-bit rvv ggml_vec_dot_q4_K_q8_K · 3bab1c9
ggml-cpu : fix riscv arch flags · 3492e6b
ggml-cpu : add more rvv ops · 20d2017
ggml-cpu : optimize rvv ggml_vec_dot_q4_K_q8_K · 624b291
ggml-cpu : optimize rvv ggml_vec_dot_q6_K_q8_K · 8f2a5af
ggml-cpu : minor rvv adjustments · 68ecc10
ggml-cpu : fix riscv include · cf71ea6

xctan force-pushed the rvv-optim branch from 9febbc3 to cf71ea6 September 2, 2025 07:04

xctan requested a review from ggerganov September 2, 2025 10:48

ggerganov approved these changes Sep 2, 2025


xctan merged commit 05c0380 into ggml-org:master Sep 3, 2025
48 checks passed

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 7, 2025

Revert "ggml-cpu : optimize RVV kernels (ggml-org#15720)"

7a6d6d6

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 26, 2025

Revert "ggml-cpu : optimize RVV kernels (ggml-org#15720)"

0ef88bc

xctan mentioned this pull request Oct 31, 2025

ggml-cpu : optimize RVV q2_k and q3_k kernels #16887 (Open)

2 participants

ggml-cpu : optimize RVV kernels #15720

ggml-cpu : optimize RVV kernels #15720

Conversation

xctan commented Sep 1, 2025

Uh oh!

xctan commented Sep 1, 2025

Performance

Validation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants